Contextual Embeddings-Based Web Page Categorization Using the Fine-Tune BERT Model

Authors

Abstract

The World Wide Web has revolutionized the way we live, causing the number of web pages to increase exponentially. The web provides access to a tremendous amount of information, so it is difficult for internet users to locate accurate and useful information on the web. In order to categorize web pages accurately based on the queries of users, methods of categorizing web pages need to be developed. The text content of web pages plays a significant role in the categorization of web pages. If a word's position is altered within a sentence, causing a change in the interpretation of that sentence, this phenomenon is called polysemy. In web page categorization, the polysemy property causes ambiguity and is referred to as the polysemy problem. This paper proposes a fine-tuned model to solve the polysemy problem, using contextual embeddings created by the symmetry multi-head encoder layer of the Bidirectional Encoder Representations from Transformers (BERT). The effectiveness of the proposed model was evaluated using the benchmark datasets for web page categorization, i.e., WebKB and DMOZ. Furthermore, the experiment series also fine-tuned the proposed model's hyperparameters to achieve 96.00% and 84.00% F1-Scores, respectively, demonstrating the proposed model's importance compared to baseline approaches based on machine learning and deep learning.
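The idea of a contextual embedding, in which the same word receives a different vector depending on the words around it, can be illustrated with a toy sketch. The code below is not the paper's model and not BERT: it is a single-head, scaled dot-product self-attention with identity projections over hand-picked two-dimensional vectors. The vocabulary, the vector values, and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical static (context-free) word vectors, hand-picked for illustration.
STATIC = {
    "bank": [1.0, 1.0],
    "river": [0.0, 2.0],
    "money": [2.0, 0.0],
    "the": [0.1, 0.1],
}


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Toy single-head self-attention (queries = keys = values = static vectors)."""
    vecs = [STATIC[t] for t in tokens]
    dim = len(vecs[0])
    outputs = []
    for query in vecs:
        # Scaled dot product between this token and every token in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vecs
        ]
        weights = softmax(scores)
        # The contextual vector is the attention-weighted mix of all token vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(weights, vecs)) for i in range(dim)]
        )
    return outputs


def contextual_vector(tokens, word):
    """Return the context-dependent vector of the first occurrence of `word`."""
    return self_attention(tokens)[tokens.index(word)]


bank_river = contextual_vector(["the", "river", "bank"], "bank")
bank_money = contextual_vector(["the", "money", "bank"], "bank")
print("bank near 'river':", bank_river)
print("bank near 'money':", bank_money)
```

In this sketch the static vector for "bank" is identical in both sentences, yet its output vector is pulled toward "river" in one case and toward "money" in the other. BERT applies the same principle with learned projections, multiple attention heads, and many stacked encoder layers.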

Similar resources

Web Page Categorization Using Artificial Neural Networks

Web page categorization is one of the challenging tasks in the world of ever-increasing web technologies. There are many ways of categorizing web pages based on different approaches and features. This paper proposes a new dimension in the categorization of web pages using an artificial neural network (ANN) by extracting the features automatically. Here eight major categories of web ...

Contextual object categorization with energy-based model

Object categorization is a hot issue in image mining. Contextual information between objects is an important kind of semantic knowledge about an image. However, previous research on object categorization has not made full use of contextual information, especially the spatial relations between objects. In addition, the object categorization methods, which generally use the probabi...

Web Page Categorization using Multilayer Perceptron with Reduced Features

The web is a huge repository of knowledge and numerous hyperlinks. The web also serves a broad diversity of user communities and global information service centers. Every day the knowledge in web pages grows rapidly. Web pages can be used to convey knowledge to web users. The voluminous size of the web complicates web information retrieval, web content filtering and web structure mi...

DISTRIBUTED APPROACH to WEB PAGE CATEGORIZATION USING MAP-REDUCE PROGRAMMING MODEL

The web is a large repository of information and to facilitate the search and retrieval of pages from it, categorization of web documents is essential. An effective means to handle the complexity of information retrieval from the internet is through automatic classification of web pages. Although lots of automatic classification algorithms and systems have been presented, most of the existing a...

A Novel Web Page Categorization Algorithm Based on Block Propagation Using Query-Log Information

Most existing web page classification algorithms, including content-based, link-based, or query-log analysis methods, treat pages as the smallest units. However, web pages usually contain some noisy or biased information which could affect the performance of classification. In this paper, we propose a Block Propagation Categorization (BPC) algorithm which deeply mines web structure and views block...

Journal

Journal title: Symmetry

Year: 2023

ISSN: 0865-4824, 2226-1877

DOI: https://doi.org/10.3390/sym15020395